Abstract

Background: Grasspea (Lathyrus sativus L., 2n = 14), a member of the family Leguminosae, holds great agronomic
potential as a grain and forage legume crop in arid areas because of its superb resilience to abiotic stresses such as
drought, flood and salinity. The crop has made little progress through conventional breeding, and detailed molecular
biology studies are scarce owing to the paucity of reliable molecular markers representative of the entire genome.

Results: Using the 454 FLX Titanium pyrosequencing technique, 651,827 simple sequence repeat (SSR) loci were
identified and 50,144 nonredundant primer pairs were successfully designed, of which 288 were randomly selected
for validation among 23 L. sativus and one L. cicera accessions of diverse provenance. Of these, 74 were polymorphic,
70 were monomorphic, and 144 yielded no PCR product. The number of observed alleles ranged from two to five, the
observed heterozygosity from 0 to 0.9545, and Shannon's information index from 0.1013 to 1.0980.
The dendrogram constructed using the unweighted pair group method with arithmetic mean (UPGMA),
based on Nei's genetic distance, showed clear distinctions and interpretable relationships among the 24
accessions.

Conclusions: The large number of SSR primer pairs developed in this study should make a significant contribution
to genomics-enabled improvement of grasspea.
Background

Grasspea (Lathyrus sativus L.) is an excellent candidate crop to provide protein and starch for human diets and
animal feeds in the arid areas [1]. It is one of the hardiest crops for adaptation to climate change because of
its ability to survive drought, flood and salinity [2]. It also plays a vital role in many low input farming
systems [3]. However, undesirable features such as prostrate plant habit, indeterminate growth, pod shattering,
later maturity and presence of the neurotoxin β-N-oxalyl-L-α,β-diaminopropionic acid (β-ODAP) limit its
cultivation under various agro-ecological conditions [4-6].

To date, fewer than 205 microsatellite (SSR) markers have been published for grasspea, and only 61 of them were
characterized for size polymorphism [7-9]. Lioi et al. [7] searched for the presence of SSRs in the European
Molecular Biology Laboratory (EMBL) nucleotide sequence database. Ten out of 20 SSR primers were successfully
amplified, and only six of them exhibited size polymorphism. In addition, Ponnaiah et al. [8] searched for
EST-SSRs in the National Center for Biotechnology Information (NCBI) database. Seven of the 19 Lathyrus EST-SSRs
300 EST-SSR and designed primers to characterize size polymorphism
among 24 grasspea accessions. Among them, 44 SSR markers were
polymorphic, 117 markers were monomorphic and 139 markers
produced no bands [9]. Lioi sequenced 400 randomly selected
clones and retrieved 119 SSR-containing sequences. Seven primer
pairs produced clearly distinguishable DNA banding patterns among
10 randomly selected SSRs. The transferability of SSR markers
was high among three related species of Lathyrus, namely
Lathyrus cicera, Lathyrus ochrus and Lathyrus tingitanus,
and the legume crop, Pisum sativum [10].

Next generation sequencing (NGS) technologies have
become popular owing to their success in sequencing DNA at
unprecedented speed, thereby enabling impressive scientific
achievements and novel biological applications
[11,12]. Next generation RNA sequencing (RNA-Seq) is
rapidly replacing microarrays as the technology of choice
for whole-transcriptome studies [13]. RNA-Seq also
provides a far more precise measurement of levels of
transcripts and their isoforms than other methods [14].
However, few studies have focused solely on high-throughput
discovery of novel microsatellite markers in orphan crops
via next generation sequencing [15-19].

In this study, we applied next generation sequencing to
obtain high-quality putative SSR loci and flanking primer
sequences inexpensively and efficiently. The novel SSR
sequences were characterized and validated through successful
amplification of randomly selected primer pairs
across a selection of 23 grasspea accessions and one accession
of its direct ancestor red pea (Lathyrus cicera) as an
outgroup.

Methods 

Plant material 

Eight grasspea (L. sativus) accessions, consisting of two
Chinese, two Asian, one African and three European
accessions, were used for the 454 sequencing.

A set of 23 grasspea (L. sativus) accessions and one 
red pea (L. cicera) accession were used in SSR marker 
testing and genetic diversity analysis. These genetic 
resources contained six accessions from China, seven 
each from Asia (including one L. cicera accession) and 
Europe, and four from Africa. 

The seed samples were obtained from the National
Genebank of China at the Institute of Crop Science, Chinese
Academy of Agricultural Sciences, Beijing, China. Detailed
information is given in Additional file 1: Table S1.

DNA isolation, library preparation and 454 sequencing 

The sprouts from each of the eight genotypes were
collected, and total genomic DNA was isolated using the
CTAB method from seven-day-old seedlings grown
in the dark at 18°C. SSR-enriched genomic libraries were
constructed by selective hybridization with streptavidin-coated
beads. The following eight probes were used: p(AC)10, p(GA)10,
p(AAC)8, p(AAG)8, p(AAT)8, p(ATGT)6, p(GATA)6 and p(AAAT)6.
Library quality control was conducted by randomly selecting and
sequencing 186 clones. The DNA fragments were inserted into the
pGEM-T Easy vector, and the inserts were validated by
Sanger sequencing. Libraries were considered high quality if
they had a high proportion of inserts and most fragments were
500 to 800 bp long.

The eight SSR-enriched DNA libraries were pooled in equal
amounts for pyrosequencing using the 454 Genome Sequencer
FLX Titanium System at Beijing Autolab Biotechnology
Co. Ltd (China). The 454 System collected
the data and generated a standard flowgram format (.sff) file
containing the raw data for all reads. The grasspea.sff file
was then submitted to the Sequence Read Archive (SRA) at the
National Center for Biotechnology Information (NCBI)
under the accession number SRX272771.

Read characterization

All high quality reads were processed to remove adaptor-ligated
regions using the vectorstrip program in the EMBOSS
software package [20]. In addition, in-house programs
(SeqTools.pl, ACGT.pl, ave_length.pl and max.pl)
were used to count the nucleotides A, T, C and G across
all reads and to determine the average and maximum
read lengths in our study.

SSR search

Before the SSR search, redundant "clean reads" were filtered out
at 98% sequence identity using the CD-HIT program
(http://weizhong-lab.ucsd.edu/cd-hit/). A high-throughput
SSR search was then performed using the MISA (MIcroSAtellite
identification) tool (http://pgrc.ipk-gatersleben.de/misa/).
The parameters were as follows: a minimum SSR motif
length of 10 bp and minimum repeat numbers of 10 for mono-,
6 for di-, and 5 for tri-, tetra-, penta- and hexanucleotide
motifs. The maximum size of interruption allowed between two
different SSRs in a compound sequence was 100 bp.
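
The repeat thresholds above can be illustrated with a minimal Python sketch. This is not MISA itself and makes no claim about how MISA is implemented; the function names `find_ssrs` and `is_primitive` are hypothetical, and the sketch only detects perfect repeats that meet the stated minimum repeat numbers.

```python
# Minimum repeat numbers per motif length, as listed in the parameters above
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit):
    """True if the unit is not itself a repeat of a shorter unit (e.g. 'ACAC' is not)."""
    k = len(unit)
    return not any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq):
    """Return sorted (start, end, motif, repeats) tuples; coordinates are 0-based, half-open."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            if set(unit) - set("ACGT") or not is_primitive(unit):
                i += 1
                continue
            n = 1  # count consecutive copies of the unit starting at i
            while seq[i + n * k:i + (n + 1) * k] == unit:
                n += 1
            if n >= min_rep:
                hits.append((i, i + n * k, unit, n))
                i += n * k  # continue after the repeat just reported
            else:
                i += 1
    return sorted(hits)


if __name__ == "__main__":
    read = "GGCT" + "AC" * 8 + "TTGCA" + "AAG" * 5 + "C"
    for hit in find_ssrs(read):
        print(hit)
```

Grouping neighbouring hits into compound SSRs (interruptions of at most 100 bp) would be a separate step applied to this output.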

SSR characterization 

The MISA file was used to analyse the number of sequences
containing SSRs, the number of SSRs detected,
the number of SSRs starting within 200 bp of the start of read
sequences, the dominant types of SSR motifs within mono-,
di-, tri-, tetra-, penta- and hexa- repeats, and the ratio of
single, perfect compound and interrupted compound
SSRs. These statistics were extracted from the MISA files [21]
with a small Perl program and plotted with R [22]
and OpenOffice.org Calc.
Primer pair design

Primer pairs were designed with Primer 3.0 [23] through the
interface modules p3_in.pl and p3_out.pl
(http://pgrc.ipk-gatersleben.de/misa/primer3.html).
These Perl scripts were used to normalize the format in
order to design primers flanking the microsatellite locus.
Amplification product sizes ranged from 100 to 300 bp.
Then, the in-house script primer_random_pick.pl
was used to obtain the non-redundant primers.

Polymerase chain reaction (PCR) amplification

For each primer pair, PCR was performed twice,
each time with a different Taq enzyme and reaction buffer.
In the first round, all primer pairs were amplified
in 20 µl reaction volumes containing 0.5 U
of TaKaRa Taq polymerase (Code No.: R001A, TaKaRa,
Dalian, China), 2 µl of 10× PCR Buffer (Mg2+ plus), 0.2 µl
of dNTP (2.5 mM each), 0.4 µM primer, and 50 ng of
genomic DNA. Primer pairs that gave no bands or weak bands
were then used in a second round of PCR with
TaKaRa LA Taq polymerase with GC buffer (Code No.:
RR02AG, TaKaRa, Dalian, China) according to the
manufacturer's instructions. SSRs were amplified on a Heijingang
Thermal Cycler (Eastwin, Beijing, China) under the
following conditions: 5 min initial denaturation at 95°C;
35 cycles of 30 s at 95°C, 30 s at the optimized annealing
temperature (Table 1) and 45 s of elongation at 72°C; and a
final extension at 72°C for 10 min. PCR products were
tested for polymorphism using 6% denaturing polyacrylamide
gels and visualized by silver nitrate staining.

Evaluation of polymorphic primers in different accessions 

A total of 288 SSR markers were randomly selected to validate
their feasibility and size polymorphism among 23 grasspea
(L. sativus) genotypes from diverse geographical locations
and one red pea (L. cicera) genotype. POPGEN1.32 [24]
software was used to calculate the observed number of
alleles (Na), the level of observed heterozygosity (Ho) and
Shannon's information index (I).
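
As an illustration of the three statistics, the following Python sketch computes Na, Ho and Shannon's index for one locus from diploid genotypes. It is not the POPGEN1.32 implementation; the function name `locus_stats`, the input layout and the toy allele sizes are assumptions made for this example. The natural logarithm is assumed, which is consistent with Table 3, where the two-allele locus G192 has I = 0.6931, approximately ln 2.

```python
import math
from collections import Counter


def locus_stats(genotypes):
    """Return (Na, Ho, I) for one locus.

    genotypes: list of (allele_1, allele_2) tuples, one per diploid individual;
    None marks an individual with no scorable band and is ignored.
    """
    scored = [g for g in genotypes if g is not None]
    allele_counts = Counter(allele for g in scored for allele in g)
    total = sum(allele_counts.values())
    na = len(allele_counts)                                  # observed number of alleles
    ho = sum(1 for a, b in scored if a != b) / len(scored)   # share of heterozygotes
    shannon = -sum((c / total) * math.log(c / total)         # I = -sum(p_i * ln p_i)
                   for c in allele_counts.values())
    return na, ho, shannon


if __name__ == "__main__":
    # Toy data (allele sizes in bp), not taken from the study
    toy = [(150, 150), (150, 160), (160, 160), (150, 160), None]
    print(locus_stats(toy))
```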

Genetic diversity analysis 

Cluster analysis was conducted based on Nei's [25] unbiased
genetic distance, using POPGEN1.32 [24] software
with the unweighted pair group method with arithmetic
mean (UPGMA) algorithm. The resulting clusters were
expressed as a dendrogram drawn by MEGA4 [26].
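
For readers unfamiliar with UPGMA, a minimal Python sketch of the clustering step follows. It assumes a precomputed symmetric distance matrix (for example, pairwise Nei's distances) and is not the POPGEN1.32 or MEGA4 code; the function name `upgma` and the four-accession matrix are hypothetical.

```python
def upgma(names, dist):
    """Cluster labels by UPGMA.

    names: list of labels; dist: symmetric matrix (list of lists) of distances.
    Returns (tree, merges): tree is a nested tuple, merges lists each join
    with its node height (half the distance between the joined clusters).
    """
    clusters = {i: (name, 1) for i, name in enumerate(names)}   # id -> (subtree, size)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])      # closest pair
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        new_d = {}
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            # Size-weighted mean, i.e. the average over all member pairs
            new_d[(c, next_id)] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        merges.append(((tree_a, tree_b), dab / 2))
        next_id += 1
    (tree, _size), = clusters.values()
    return tree, merges


if __name__ == "__main__":
    labels = ["A", "B", "C", "D"]            # hypothetical accessions
    matrix = [[0, 2, 6, 10],
              [2, 0, 6, 10],
              [6, 6, 0, 12],
              [10, 10, 12, 0]]
    tree, merges = upgma(labels, matrix)
    print(tree)
    for subtree, height in merges:
        print(subtree, round(height, 3))
```

The size-weighted update is what distinguishes UPGMA from the weighted variant (WPGMA), which would average the two distances equally.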

Results 

Quality control during library construction 

The quality of the SSR-enriched grasspea library was inspected
by sequencing 186 randomly selected clones. The resulting
data showed that the recombination rate was 95%
and that 29 sequences contained 89 SSR motifs within the
cloned sequences.



454 sequencing and read characterization

A total of 493,364 reads were generated from the Roche
454 GS FLX Titanium platform. After adaptor removal,
370,079 read sequences were used for further analysis.
The most common nucleotide was thymine, accounting
for 27.7% of total nucleotides, followed by adenine
(27.2%), guanine (22.2%) and cytosine (22.1%). The mean
GC content was 44.3%. The average read length
was 453 bp, with a maximum length of 1,162 bp
(Figure 1).
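
The read statistics reported here (base composition, GC content, mean and maximum read length) can be computed along the following lines. This Python sketch is only an illustration and is not the in-house Perl scripts; the function name `read_stats` and the toy reads are assumptions. With the reported values, GC content is simply 22.2% + 22.1% = 44.3%.

```python
from collections import Counter


def read_stats(reads):
    """Return (percent per base, GC percent, mean length, maximum length)."""
    counts = Counter()
    for read in reads:
        counts.update(base for base in read.upper() if base in "ACGT")
    total = sum(counts.values())
    percent = {base: 100.0 * counts[base] / total for base in "ACGT"}
    gc_percent = percent["G"] + percent["C"]
    lengths = [len(read) for read in reads]
    return percent, gc_percent, sum(lengths) / len(lengths), max(lengths)


if __name__ == "__main__":
    toy_reads = ["ACGT", "AATT"]  # toy input, not study data
    print(read_stats(toy_reads))
```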

Mining for SSRs (simple sequence repeats) 

First, we employed the program CD-HIT
(http://weizhong-lab.ucsd.edu/cd-hit/) to produce a set of 280,791
non-redundant representative sequences. Then, the Microsatellite
identification tool (MISA)
(http://pgrc.ipk-gatersleben.de/misa/) was used for microsatellite mining. As
a result, 651,827 SSRs were identified in 129,886 read sequences.
Among them, 115,172 read sequences contained
more than one SSR. The number of SSRs present in
compound formation was 464,271 (Table 2), which means that a
high proportion of SSR loci (71.2%) was located within
compound repeats. The majority of identified SSRs
(65.4%) started within 200 bp of the 5'-terminus,
and few SSRs fell near the 3'-terminus (Figure 2).

SSR motif characterization

The identified SSRs included 995 (0.2%) mononucleotide
repeat motifs, 385,385 (59.1%) dinucleotide repeat motifs,
238,752 (36.6%) trinucleotide repeat motifs, 21,200 (3.3%)
tetranucleotide repeat motifs, 2,911 (0.4%) pentanucleotide
repeat motifs, and 2,584 (0.4%) hexanucleotide repeat
motifs (Figure 3). Thus over 95% of the motifs were di-
and tri-nucleotides. The most abundant repeat motif
type was (AC/GT)n, followed by (AAC/GTT)n, (AG/CT)n,
(ACG/CTG)n and (ACGT/ATGC)n
(Additional file 2: Figure S1, Additional file 3: Figure S2,
Additional file 4: Figure S3, Additional file 5: Figure S4,
Additional file 6: Figure S5, Additional file 7: Figure S6).
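
The percentages above follow directly from the reported counts, as the short Python check below shows. The counts are those given in the text; the variable names are only illustrative.

```python
# SSR counts per motif class as reported in the text
counts = {
    "mono": 995,
    "di": 385_385,
    "tri": 238_752,
    "tetra": 21_200,
    "penta": 2_911,
    "hexa": 2_584,
}

total = sum(counts.values())  # should equal the 651,827 SSRs identified
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

if __name__ == "__main__":
    print(total)
    print(shares)
    print(round(100 * (counts["di"] + counts["tri"]) / total, 1))  # di + tri share
```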

Compound SSR analysis 

In our study, perfect SSRs (e.g., (CA)8, named
P2 type) were less frequent (29.4%) than
compound SSRs (70.6%). There were two kinds
of compound SSRs: those with an interruption between two
motifs (e.g., (CT)8cacacg(CA)9, named C type)
and those without an interruption between two motifs
(e.g., (GT)6(GTC)6, named C* type). In total,
123,444 C type (93.2%) and 8,989 C* type (6.8%) compound
SSRs were detected, which suggested the complexity of
the grasspea genome.
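
The three SSR types can be told apart from the positions of the repeats on a read. The Python sketch below is a simplified illustration, not MISA's own logic; the function name `classify_ssrs` and the coordinate convention are assumptions, and the 100 bp limit is the maximum interruption given in the Methods.

```python
MAX_INTERRUPTION = 100  # bp, maximum interruption between SSRs in a compound SSR


def classify_ssrs(hits):
    """Group the SSRs found on one read and label each group.

    hits: iterable of (start, end, motif), 0-based half-open coordinates.
    Returns a list of (label, group) where label is 'P<n>' for a single
    perfect SSR with an n-bp motif, 'C' for a compound SSR with at least
    one interruption, or 'C*' for a compound SSR without interruption.
    """
    groups = []
    for hit in sorted(hits):
        if groups and hit[0] - groups[-1][-1][1] <= MAX_INTERRUPTION:
            groups[-1].append(hit)      # close to the previous SSR: same compound
        else:
            groups.append([hit])
    labelled = []
    for group in groups:
        if len(group) == 1:
            label = "P%d" % len(group[0][2])
        elif all(nxt[0] <= prev[1] for prev, nxt in zip(group, group[1:])):
            label = "C*"
        else:
            label = "C"
        labelled.append((label, group))
    return labelled


if __name__ == "__main__":
    # (CT)8 cacacg (CA)9: a 6-bp interruption between the two repeats
    print(classify_ssrs([(0, 16, "CT"), (22, 40, "CA")]))
    # (GT)6(GTC)6: the two repeats are directly adjacent
    print(classify_ssrs([(0, 12, "GT"), (12, 30, "GTC")]))
```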
Table 1 Characteristics of 74 polymorphic microsatellite loci developed in grasspea (FP = forward primer, RP = reverse
primer, Ta = annealing temperature)

Primer | Repeat motif | FP (5'-3') | RP (5'-3') | Real product size (bp) | Ta (°C)
G1 | (A)10 | AAGGAGCAGCAGCAmG^ | TAATAATGGGGAGCCGATCA | 210-240 | 52
G4 | (AAAG)5 | CCmCGGAGCAATCAAGAC | TGCCTAAGCATOGCmCT | 110-120 | 56
G5 | (AAC)10 | CACAACCAGTOCATCAGTG | TGGCTCACATGATGGmGT | 200-220 | 56
G6 | (AAC)12 | TGGAGGACGAGCAACAATAA | TGTOTOATGGAAACAAATGA | 230-250 | 54
G7 | (AAC)5 | ACAGCAAGAAGCAGCAACAG | AGTTGGTTGTTGJGJCGJTGJ | 230-245 | 56
G9 | (AAC)6 | CAACCAGAGCAACCACAAGA | GGTOCAAGAGGTOCAGAT | 240-260 | 56
G13 | (AACCA)5 | CAAACCAAACCAAACCAAACT | CGCG^GG^CGTACT | 180-195 | 52
G15 | (AAG)6 | TCAAGCCCAAAGTGAGATGA | ^GTGTOOTGCTGACC | 145-155 | 56
G17 | (AAT)5 | CAGGTCCGGCTOTCTCTCA | TOGmCAACCCACTCCTC | 180-200 | 56
G18 | (AC)10 | ACACGACACACACGACAGTG | CTGCGTGTCTGTGCCTATO | 130-140 | 52
G27 | (AC)18 | ATCTOCCGGGGATCCATO | OTCCCCATCTCTGGTG^ | 190-210 | 56
G33 | (AC)6 | ACCAAAGGATGCAGGGTCTA | TAGTCGTGGTGTCGTGGTGT | 230-270 | 54
G39 | (AC)6 | CCAGACACACACGCAAACAT | GTGTGTGACGTOCCGTOG | 170-190 | 54
G49 | (AC)7 | ACGCACACACGGAAGAAAG | GTGTGCGCATGTGTGTATGA | 160-180 | 56
G61 | (AC)8 | CACACACCATOCGCACACA | TGGTGTCGTGGTCGTAGGTA | 130-150 | 54
G64 | (AC)8 | GCACATOGCACGTATOAC | CGmCTGAGTGCGTOTGT | 160-180 | 56
G67 | (AC)9 | CACCCTCTOACTGCCTAGC | TOGGGGTOTAGAAGGAAC | 120-150 | 52
G68 | (AC)9 | GCACACAAGGGCACACTG | TGCGTCGTGTGTATGTGTO | 170-180 | 52
G72 | (ACA)5 | CAACGACAACAACGCAAAAC | TOGCGGmGTCCAmAG | 260-280 | 52
G73 | (ACA)5 | CCAACTCTCAGCCACGAACT | TOCTCCACCTACGCTO^ | 200-220 | 54
G75 | (ACA)6 | AACAACAGCAGCAACAACAAT | CGTGTOTGTGTOGTOGTA | 200-215 | 54
Table 1 Characteristics of 74 polymorphic microsatellite loci developed in grasspea (FP = forward primer, RP = reverse
primer, Ta = annealing temperature) (Continued)

Primer | Repeat motif | FP (5'-3') | RP (5'-3') | Real product size (bp) | Ta (°C)
G76 | (ACA)6 | CACAACCAACGCCAATACAG | CCGTAGTACCGCGCTTATTC | 230-250 | 54
G77 | (AAQ11 | ACAAGACAACATCACCGAGAC | TGTOmGGTOTOGTGTA | 300-330 | 52
G80 | (A AC)5 | AAACACAACAGACGATOAACACA | TCTOCTATGTAGTGTOTGTGATG | 185-200 | 52
G81 | (ACG)5 | CGCACACACTCACACACAAC | GGTCCTGTCGTCGTAGTCCT | 180-200 | 52
G83 | (ACG)7 | GGGCACACATCTCACACAC | TGTCGTCGTGTCGTAGTCGT | 190-200 | 54
G87 | (AG)15 | CCCmCCGAGTGCAGAAAA | CACCACGACTOCTCACCTA | 230-250 | 54
G101 | (CA)11c(CA)7 | TGGCAGGTAACTGGTGAGTG | GGTGmCCCCACCTCTCTA | 180-190 | 52
G102 | (CA)12 | AAAGCACAGCACAACACGAC | AACAAGGACGACGGTAGGTG | 260-280 | 54
G110 | (CA)6 | CACAAACACGCACAAACACA | CGTCGGTATAACCGTGTCGT | 150-170 | 52
G116 | (CA)6(CACACG)5 | CACACAGGACAGCACTCACA | GTCGTCGGTGTGTCGTAGTC | 150-180 | 52
G119 | (CA)6cgacacacncgcgcgcgcgacacac(ACG)8 | CGTCTOTCAAAGGGCCATA | CGACCGACCGACGTACTACT | 190-200 | 52
G120 | (CA)6cgcacgcacgcacacagacacg(CA)7 | GCGCACGCATACATACACA | TOCCGTOTCGTGTOGTG | 160-170 | 54
G123 | (CA)6gn(AC)6 | CATAACAACACGCAGCATOCC | TOCGTOTOTGTOTGm | 130-140 | 52
G128 | (CA)7 | CCACACACCCACATGTCA | TOTGGTGGGTCTGAGAGTG | 210-230 | 56
G131 | (CA)7aacacgttcg(CA)8 | GCGCTCACACCAACATAAAG | TGTATGCGTGCGTATGTCTG | 140-150 | 54
G133 | (CA)7cgcacat(AC)6 | ACGCGTGCACACA^ATC | TATGTGGGCGCGTGTAAGTA | 200-220 | 52
G136 | (CA)7tacacacacat(AC)7aa(AC)6 | ACGACGACCACCAGTACGA | ACGAGTGCGTGTGTGAGTGT | 110-130 | 54
G142 | (CA)8cgcacaa(AC)10 | CGTGCACGCACAGATACG | GTGTGTGTGTCGTCGmG | 160-U | 52
G143 | (CA)8cggcgcgcg(AC)9 | GACACACACAACCCGAACAC | TGAGCGAACGTACGTGGTAG | 230-260 | 52
G145 | (CA)8tacgcacg(CA)10 | ATACAAGCACGCATCCACAG | AGTOGTGTCGTGTCGTGTC | 100-120 | 52
G147 | (CA)9 | CGTCACACACGTCACGTACA | CTACGAGACGCACGACGATA | 210-230 | 54
G150 | (CA)9g(AC)25 | CACACACACCAAGCGTOCA | TCGTGTGTGTCGTGTGTGTAG | 140-160 | 54
Table 1 Characteristics of 74 polymorphic microsatellite loci developed in grasspea (FP = forward primer, RP = reverse
primer, Ta = annealing temperature) (Continued)

Primer | Repeat motif | FP (5'-3') | RP (5'-3') | Real product size (bp) | Ta (°C)
G151 | (CAA)10 | CAACAACGACAACAAAATOTAA | CTGCTGATGTOTOGTGCT | 175-185 | 52
G154 | (CAA)5agaccacaacaccaccaccaacaacaacaataataaaacag(AAC)5 | CTGGCGTAATAGCGAAGAGG | TGTGTOCmGTGTOTCGT | 240-260 | 56
G157 | (CAA)6 | ACATCCAATCCCCACCATAA | AATGCATGGTOTOOTGA | 210-230 | 56
G165 | (CGA)5 | GAACGTACGACGACACGAACT | CGTGTGGTGTGTGTGTGTGT | 270-290 | 54
G171 | (CT)9 | OTCACTGCATGCmCCAC | CTGGGGTGG^mGTCAGT | 200-230 | 52
G174 | (GA)19 | CACAAGGGTCAAGGGAGAGA | GmACGTOCTOTCGTOGTOG | 140-160 | 52
G184 | (GT)15 | GCGTGTGTGCGAATGTGT | CACGCACGCACACTAGACTAC | 180-190 | 52
G185 | (GT)19 | TGCGTGTGTCGCTCTATCAT | TACTGCGACAACCGAACGTA | 130-135 | 56
G188 | (GT)6 | GCGCGTOGTGTGTGmGA | CACGCACGCACACTOCATA | 140-150 | 52
G191 | (GT)6 | TGTGCGTGGTGmGAGTG | CACATACGCACAGCCCATAC | 140-160 | 52
G192 | (GT)6a(TG)7 | TGCGTGATAAGGTGOTGAG | ACACACACACGCACACACAC | 160-170 | 56
G200 | (GT)7 | GGATGGTGTGCTGTGTGTGT | AACACCAACTACCGGCAACT | 120-135 | 52
G205 | (GT)7gcgtgtgcctgcgtctctgcgagtgcgtgc(GT)6 | TGTCTGGTGTGTGTGGTGTG | CGACACGTACGCAACGAC | 230-250 | 52
G206 | (GT)8 | AAACTGGCCCTGCA^C | GGTCATGGCAAmGAGACA | 190-210 | 52
G209 | (GT)8 | mGCACGTGTCCTGTGm | ACGACGACCACACACCACTA | 240-260 | 52
G211 | (GT)9 | ATGGCGTCGTATGTGTGTGT | GTOCGGCCGAATCAACAAC | 200-210 | 52
G219 | (GTT)10 | CCAGTOTGCCGAACACAT | CCAACAGCAGATOCCAGTA | 130-160 | 52
G225 | (GTT)7 | GGGCAGTGGACCAGTOGAG | CCGAGGGAATAAACGACAAA | 250-270 | 52
G228 | (T)10 | CCTACGGACATGCCTG^ | GCGGTAGGGGAAAAACAACT | 280-310 | 52
G233 | (TC)20 | CGTCGTCOTCTCCTCCTA | AGACGACTACGGACGACGAC | 120-140 | 52
G234 | (TC)7 | GTOGGmGGCATOAACT | GAAGGGGCGAACAAATAAAA | 190-210 | 52
G244 | (TG)6 | CAATCCGAAAATCACCACCT | GCACTCACATGCACACAAAC | 230-250 | 52
Table 1 Characteristics of 74 polymorphic microsatellite loci developed in grasspea (FP = forward primer, RP = reverse
primer, Ta = annealing temperature) (Continued)

Primer | Repeat motif | FP (5'-3') | RP (5'-3') | Real product size (bp) | Ta (°C)
G245 | (TG)6 | CGTOGTOTOGTCGGTCA | GAACGAAACAACGACGACAA | 240-260 | 52
G249 | (TG)6c(GT)7 | TATGTGTGCAACGGCAAC^ | GCACACCCACCACACAATAG | 140-160 | 52
G254 | (TG)7 | TGAGTGCGTACGTGTGTCTG | GCGCGTGTCACACATAGAC | 100-120 | 52
G262 | (TGGT)5 | TGTGCGTGTGTGTG 1 1 1 1 IG | ACCACAACCCCTACCCTACC | 300-320 | 52
G268 | (TGT)5tattn(TO)6 | TTGmGTTGTTGTTGTGTOTG | CTACAGTACAGACCCGCCACT | 290-305 | 52
G269 | (TGT)6 | ATGCTGTOATGCGTCAG^ | TGCAGCAACAACAAATAAGACA | 220-240 | 52
G273 | (TGT)7 | 1 1 1 1 IGGTATO^GTOTCGT | CTGCAGCAATAACAGCATCAG | 250-270 | 52
G284 | (TTG)6 | TGTGTOTGTOTGCTGTATGTA | GCAGCAACATOAAACGAACAG | 160-170 | 52
G285 | (TTG)6 | mGTGCGGTOATGTO^ | CTACGTCAGCCCGTCATACC | 190-200 | 52



Figure 1 Size distribution of 454 reads.
Table 2 MISA result in the genome survey

Category | Numbers
Total number of sequences examined | 280,791
Total size of examined sequences (bp) | 130,484,900
Total number of identified SSRs | 651,827
Number of SSR containing sequences | 129,886
Number of sequences containing more than one SSR | 115,172
Number of SSRs present in compound formation | 464,271



Primer pair design

A total of 62,342 primer pairs flanking the SSRs were
successfully designed using the public software Primer 3.0
(http://www-genome.wi.mit.edu/genome_software/other/primer3.html),
based on criteria of melting temperature,
GC content and the lack of secondary structure. Furthermore,
50,144 non-redundant primer pairs were obtained with
in-house programs (Additional file 8: Table S2).

Validation of SSR markers 

To validate the SSR sequences, 288 SSR primer pairs
were randomly selected for PCR amplification and tested for size
polymorphism among 23 grasspea (L. sativus) genotypes
from diverse geographical locations and one red pea
(L. cicera) genotype. After two rounds of PCR amplification,
74 primer pairs were confirmed to amplify polymorphic
bands across the 24 genotypes
(Table 1), 70 primer pairs amplified
only monomorphic fragments, and 144 primer pairs
produced no products. The number of observed alleles
(Na) ranged from two to five, the observed heterozygosity
(Ho) from 0 to 0.9545, and Shannon's information index
(I) from 0.1013 to 1.0980 (Table 3). These results
indicate the broad utility of the SSR markers obtained
from next-generation sequencing for future studies of
grasspea genetics.
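
The validation outcome can be summarised as proportions of the 288 tested primer pairs. The short Python check below uses only the counts reported above; the variable names are illustrative.

```python
# Outcome of the two PCR rounds for the 288 randomly selected primer pairs
outcomes = {"polymorphic": 74, "monomorphic": 70, "no product": 144}

tested = sum(outcomes.values())
share = {name: round(100 * n / tested, 1) for name, n in outcomes.items()}

# Primer pairs that produced bands, and the polymorphic fraction among them
amplified = outcomes["polymorphic"] + outcomes["monomorphic"]
polymorphic_among_amplified = round(100 * outcomes["polymorphic"] / amplified, 1)

if __name__ == "__main__":
    print(tested, share)
    print(amplified, polymorphic_among_amplified)
```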

Genetic diversity study 

To assess the efficiency of microsatellites in differentiating
L. sativus from other Lathyrus species, we chose
one L. cicera accession (ELS 0246, Syria) as the outgroup in
the genetic diversity study. Cluster analysis based on Nei's
[25] genetic distance indicated good separation between L.
sativus and L. cicera. Furthermore, the UPGMA procedure
grouped most Chinese accessions into one cluster, whereas the
Mediterranean accessions, which come from the center of origin,
carried most of the genetic diversity of cultivated grasspea
and were spread across all clusters except the Chinese one
(Figure 4). These results support the accuracy
and effectiveness of our approach for developing SSR
markers in grasspea with NGS technology.
Figure 2 Distribution of SSR motif start position.

Discussion

Grasspea as a potential vital crop in arid areas 

Frequent drought and water shortage are worldwide
problems, especially for agricultural production. Dryland
agriculture plays an important role in national economies
and food security. For example, in China, 55% of the
total arable land and 43% of the total food supplies are
related to dryland agriculture. Grasspea is popular
among resource-poor farmers in marginal areas because
it can be grown successfully
under adverse agro-climatic conditions without many
production inputs. Globally, it is currently grown
on 1.5 million ha, with a production of 1.2 million
tonnes [2]. In recent years, efforts have been underway in many
countries, including China, Australia, Spain, Italy, and
Canada, to expand its cultivation as a break crop between
cereals and as a bonus crop on fallow land because
of its ability to fix large amounts of atmospheric
nitrogen in association with Rhizobium bacteria [7].
However, the presence of a neurotoxin,
β-N-oxalyl-L-α,β-diaminopropionic acid (β-ODAP), has left this crop
neglected and underutilized. Despite undesirable
features such as its high neurotoxin content, grasspea has potential
as an important crop in western China and other arid
areas of the world.



Mining genomic SSR loci using 454 pyrosequencing 
technology 

The traditional methods of microsatellite development
used a library-based approach targeting specific SSR repeat
motifs, which was time consuming, expensive and
low-throughput. In silico mining of EST-SSRs from public
databases is an alternative that is cost
effective and easily accessible. However, the total number
of ESTs from grasspea and related species is very
limited, since grasspea has received little attention in
molecular studies.

The identification of SSRs from genomic DNA using
454 pyrosequencing is relatively new,
and two strategies have been published: shotgun
sequencing [16-18] and SSR-enriched sequencing [15,19].
In the present study, we used SSR-enriched sequencing
and generated 370,079 high quality grasspea
genomic reads with an average length of 453 bp. In theory,
longer reads increase the chances of successfully
designing primer pairs and make it possible
to identify long SSR repeats comparable in size to those
obtained with the traditional library-based approach [18,27].
According to the MISA analysis, 651,827 SSRs were identified
in 129,886 reads. Both the high proportion of SSR-containing
reads (129,886 of the 280,791 non-redundant reads, 46.3%) and
the large number of putative SSRs are encouraging. Di-
and tri-nucleotide repeat motifs dominated the grasspea
genomic sequences, similar to findings in other crops [28].
(AC/GT)n was not only the predominant di-nucleotide
repeat motif but also the most frequent motif overall,
accounting for 55.2% of the total SSRs, followed
by (AAC/GTT)n, (AG/CT)n and (ACG/CTG)n, whereas (AT/TA)n,
(CG/GC)n and (CCG/CGG)n were rarely detected in this
study. The pattern was moderately similar to that previously
observed in faba bean [15]. Furthermore, the
low proportion of unwanted repeat motifs
such as (AT/TA)n, (CG/GC)n and (CCG/CGG)n should
improve the success rate of primer design.

Utilization of new SSR resources for 'orphan crop' 
grasspea research 

Conventional breeding and phenotype research have achieved
great progress in improving agricultural crops in the last
few years. However, grasspea has remained an 'orphan crop' owing
to the lack of available genetic and genomic resources
[29]. The use of SSR markers as a conventional tool has
played an important role in the study of genetic diversity,
genetic linkage mapping, QTL mapping and association
mapping, and has paved the way for the integration of
genomics into crop breeding.

Due to the scarcity of user-friendly, highly polymorphic
molecular markers in grasspea and other Lathyrus species,
high-density genetic maps have not been available. In the
present study, we validated 288 non-redundant SSR primer
pairs, of which 144 (50.0%) produced amplified
bands, with 74 being polymorphic and 70 monomorphic.
This very large set of potential genomic-SSR markers will
facilitate the construction of high-resolution maps for
positional cloning and QTL mapping.

Table 3 Results of initial primer screening through 24
diversified accessions in Lathyrus

Primer pair ID | Na¹ | Ho² | I³
G1 | 3 | 0.4211 | 0.8258
G4 | 2 | 0.0000 | 0.1914
G5 | 2 | 0.2381 | 0.5196
G6 | 2 | 0.5000 | 0.5623
G7 | 2 | 0.0000 | 0.1849
G9 | 3 | 0.8750 | 0.7691
G13 | 2 | 0.1176 | 0.5456
G15 | 2 | 0.0714 | 0.1541
G17 | 2 | 0.2000 | 0.3251
G18 | 4 | 0.1500 | 0.5086
G27 | 4 | 0.0870 | 0.7216
G33 | 3 | 0.5556 | 0.7086
G39 | 3 | 0.3846 | 0.7436
G49 | 5 | 0.9545 | 1.0691
G61 | 4 | 0.7143 | 0.9592
G64 | 3 | 0.1429 | 1.0346
G67 | 5 | 0.6250 | 1.0782
G68 | 2 | 0.9091 | 0.6890
G72 | 2 | 0.0000 | 0.2146
G73 | 3 | 0.0667 | 0.7689
G75 | 2 | 0.3478 | 0.4620
G76 | 4 | 0.5714 | 1.0980
G77 | 3 | 0.4737 | 0.8011
G80 | 2 | 0.0000 | 0.1849
G81 | 4 | 0.6818 | 0.9351
G83 | 2 | 0.0000 | 0.2712
G87 | 3 | 0.1111 | 0.4258
G101 | 3 | 0.1500 | 0.3141
G102 | 2 | 0.0000 | 0.3768
G110 | 2 | 0.5500 | 0.6819
G116 | 3 | 0.0000 | 0.4634
G119 | 2 | 0.0526 | 0.2762
G120 | 2 | 0.0556 | 0.1269
G123 | 3 | 0.0435 | 0.2090
G128 | 2 | 0.2500 | 0.6919
G131 | 3 | 0.2609 | 0.4776
G133 | 3 | 0.6190 | 0.7920
G136 | 2 | 0.6667 | 0.6365
G142 | 2 | 0.6000 | 0.6730
G143 | 2 | 0.6667 | 0.6365
G145 | 2 | 0.6154 | 0.6172
G147 | 3 | 0.3571 | 0.7401
G150 | 2 | 0.4286 | 0.5196
G151 | 2 | 0.3000 | 0.4227
G154 | 4 | 0.5000 | 1.0251
G157 | 2 | 0.3889 | 0.6792
G165 | 2 | 0.8095 | 0.6749
G171 | 3 | 0.0000 | 0.4634
G174 | 2 | 0.1667 | 0.4029
G184 | 2 | 0.0000 | 0.3622
G185 | 2 | 0.0417 | 0.1013
G188 | 3 | 0.0667 | 0.5627
G191 | 2 | 0.2500 | 0.6616
G192 | 2 | 0.8667 | 0.6931
G200 | 4 | 0.9000 | 0.9386
G205 | 2 | 0.0000 | 0.6172
G206 | 2 | 0.0000 | 0.2868
G209 | 2 | 0.5000 | 0.5623
G211 | 2 | 0.0000 | 0.3768
G219 | 3 | 0.7500 | 0.9881
G225 | 2 | 0.4348 | 0.5236
G228 | 2 | 0.1176 | 0.3622
G233 | 3 | 0.2000 | 0.5627
G234 | 2 | 0.1250 | 0.4826
G244 | 3 | 0.9167 | 0.9222
G245 | 2 | 0.3500 | 0.4637
G249 | 2 | 0.5294 | 0.5779
G254 | 3 | 0.0909 | 0.3558
G262 | 2 | 0.0000 | 0.4506
G268 | 2 | 0.0000 | 0.1732
G269 | 2 | 0.0000 | 0.6365
G273 | 2 | 0.0000 | 0.1732
G284 | 2 | 0.0000 | 0.2237
G285 | 2 | 0.1739 | 0.2954

¹ The number of observed alleles.
² Estimated proportion of observed heterozygosity under random mating using
Nei's (1978) unbiased heterozygosity.
³ Shannon's Information index (Lewontin, 1972).

Figure 4 UPGMA dendrogram of 24 germplasm resources.

The genus Lathyrus L. (Fabaceae) consists of about
160 species [30] distributed throughout the temperate
regions of the northern hemisphere and extending into
tropical East Africa and South America [31,32]. In this
study, we used 74 new SSR primer pairs to clearly separate
the 23 L. sativus accessions from the one L. cicera accession,
which agrees with reported phylogenetic
studies of Lathyrus L. (Fabaceae) based on morphological
and molecular markers [7,31].

Conclusion 

This study provides an extensive characterization of the
SSRs in the grasspea genome. For the first time, large-scale
SSR-enriched sequence data were generated for the
identification of SSRs and the development of SSR markers to
accelerate basic and applied genomics research in grasspea.